Look at Who's Talking: Voice Activity Detection by Automated Gesture Analysis

Authors

  • Marco Cristani
  • Anna Pesarin
  • Alessandro Vinciarelli
  • Marco Crocco
  • Vittorio Murino
Abstract

This paper proposes an approach for Voice Activity Detection (VAD) based on the automatic measurement of gesturing. The main motivation of the work is that gestures have been shown to be tightly correlated with speech, hence they can be considered reliable evidence that a person is talking. The use of gestures rather than speech for performing VAD can be helpful in many situations (e.g., surveillance and monitoring in public spaces) where speech cannot be obtained for technical, legal or ethical reasons. The results show that the gesturing measurement approach proposed in this work achieves, on a frame-by-frame basis, an accuracy of 71 percent in distinguishing between speech and non-speech.
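
The frame-by-frame accuracy mentioned in the abstract can be read as the fraction of video frames whose predicted speech/non-speech label matches the ground truth. The following is a minimal sketch of that metric only, assuming one binary label per frame (1 for speech, 0 for non-speech). The function name `frame_accuracy` and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def frame_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Return the fraction of frames whose predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of frames")
    if len(reference) == 0:
        raise ValueError("at least one frame is required")
    # Count the frames where the two label sequences agree.
    matches = sum(1 for p, r in zip(predicted, reference) if p == r)
    return matches / len(reference)


# Hypothetical per-frame labels (1 = speech, 0 = non-speech), for illustration only.
predicted = [1, 1, 0, 0, 1, 0, 1, 1]
reference = [1, 0, 0, 0, 1, 1, 1, 1]
print(f"{frame_accuracy(predicted, reference):.2f}")  # prints 0.75
```

In this toy input, six of the eight frames agree, so the accuracy is 0.75. The paper's 71 percent figure would correspond to the same ratio computed over its own evaluation frames.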

Similar resources

Combined Gesture-Speech Analysis and Synthesis

Multi-modal speech and speaker modelling and recognition are widely accepted as vital aspects of state of the art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modelling of head, hand and...

Look Who's Talking: Speaker Detection using Video and Audio Correlation

The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speechreading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned ...

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many of the non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

Neural Network Performance Analysis for Real Time Hand Gesture Tracking Based on Hu Moment and Hybrid Features

This paper presents a comparison study between the multilayer perceptron (MLP) and radial basis function (RBF) neural networks with supervised learning and back propagation algorithm to track hand gestures. Both networks have two output classes which are hand and face. Skin is detected by a regional based algorithm in the image, and then networks are applied on video sequences frame by frame in...

Why Do You Ask So Many Questions?

[N]o amount of telling somebody in mere words that one is or is not angry is the same as what one might tell them by gesture or tone of voice .... Anyhow, it is all nonsense. I mean, the notion that language is made of words is all nonsense - and when I said that gestures could not be translated into "mere words," I was talking nonsense, because there is no such thing as "mere words" ... Gregory...

Publication date: 2011